Antipattern: Solve a Complex Problem in One Step

Let's look into the drawbacks of retrieving results in one go in detail.

SQL is a very expressive language — you can accomplish a lot in a single query or statement. But that doesn’t mean it’s mandatory or even a good idea to aim to solve every problem in one line of code. Do we have this habit with any other programming language we use? Probably not.

Unintended products#

One common consequence of producing all our results in one query is something called a Cartesian product. This happens when two of the tables in the query have no condition restricting their relationship. Without such a restriction, the JOIN for the two tables pairs each row in the first table to every row in the other table. Each such pairing becomes a row of the result set, and we end up with many more rows than we expected.

Let’s see an example. Suppose we want to query our bugs database to count the number of bugs, fixed and open, for a given product. Many programmers would try to use a query like the following to calculate these counts:

Retrieving all the results with a single query

In reality, we happen to know that there are eleven fixed bugs and seven open bugs for the given product. So, the result of the query is puzzling:

product_id count_fixed count_open
1 77 77

What caused this to be so inaccurate? It’s no coincidence that 77 is 11 times 7. This example joins the Products table to two different subsets of Bugs, but this results in a Cartesian product between those two sets of bugs. Each of the twelve rows for FIXED bugs is paired with all seven rows for OPEN bugs.

We can visualize the Cartesian product graphically below. Each line connecting a fixed bug to an open bug becomes a row in the interim result set (before grouping is applied).

Cartesian product between fixed and open bugs

We can see this interim result set by eliminating the GROUP BY clause and aggregate functions.

Retrieving results using a single query

The only relationships expressed in this query are between the BugsProducts table and each subset of Bugs. No conditions restrict every FIXED bug from matching with every OPEN bug, and the default result is that they do. The result produces twelve times seven rows.

It’s all too easy to produce an unintentional Cartesian product when we try to make a query do double-duty like this. If we try to do more unrelated tasks with a single query, the total could be multiplied by yet another Cartesian product.

As though that weren’t enough#

Besides the fact that we can get the wrong results, it’s important to consider that these queries are simply hard to write, hard to modify, and hard to debug. We should expect to get regular requests for incremental enhancements to our database applications. Managers want more complex reports and more fields in a user interface. If we design intricate, monolithic SQL queries, it’s more costly and time-consuming to make enhancements to them. After all, our time is worth something, both to us and to our projects.

There are runtime costs, too. An elaborate SQL query that uses many JOIN operations, correlated subqueries, and other operations is harder for the SQL engine to optimize and execute quickly than a more straightforward query. Programmers instinctively feel that the fewer the SQL queries, the better the performance. This is true, assuming that the SQL queries in question are of equal complexity. On the other hand, the cost of a single monster query can become extremely large and it can be much more economical to use several simpler queries, instead.

Synopsis: Spaghetti Query
Solution: Divide and Conquer
Mark as Completed
Report an Issue